RHPTree—Risk Hierarchical Pattern Tree for Scalable Long Pattern Mining
Authors
Abstract
Risk patterns are crucial in biomedical research and have served as an important factor in precision health and disease prevention. Despite recent development in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post pattern filtering. In this article, we propose a novel dynamic tree structure, the Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction for the iterative mining process and for dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of certain items of interest. Experiments on both UCI machine learning datasets and sampled datasets of the Simons Foundation Autism Research Initiative (SFARI) Simons Simplex Collection (SSC) demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed new tree structure is generic and applicable to other pattern mining problems.
Similar resources
Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining
Mining frequent tree patterns is an important research problem with broad applications in bioinformatics, digital library, e-commerce, and so on. Previous studies highly suggested that pattern-growth methods are efficient in frequent pattern mining. In this paper, we systematically develop the pattern growth methods for mining frequent tree patterns. Two algorithms, Chopper and XSpanner, are d...
HYPE: Mining Hierarchical Sequential Pattern Mining
Mining data warehouses is still an open problem as few approaches really take into account the specificities of this framework (e.g. multidimensionality, hierarchies, historized data). Multidimensional sequential patterns have been studied. However, they do not provide any way to handle hierarchies. In this paper, we propose an original method of extraction of sequential patterns taking into acc...
Constraint-based Tree Pattern Mining
Most work on pattern mining focuses on simple data structures like itemsets or sequences of itemsets. However, many recent applications dealing with complex data, such as chemical compounds, protein structures, XML and Web log databases, and social networks, require much more sophisticated data structures (trees or graphs) for their specification. Here, interesting patterns involve not only frequent ob...
Tree pattern mining with tree automata constraints
Most work on pattern mining focuses on simple data structures such as itemsets and sequences of itemsets. However, a lot of recent applications dealing with complex data like chemical compounds, protein structures, XML and Web log databases and social networks, require much more sophisticated data structures such as trees and graphs. In these contexts, interesting patterns involve not only freq...
Frequent Pattern Mining using CATSIM Tree
Efficient algorithms to discover frequent patterns are essential in data mining research. Frequent pattern mining is emerging as a powerful tool for many business applications, such as e-commerce, recommender systems, supply chain management, and group decision support systems, to name a few. Several effective data structures, such as two-dimensional arrays, graphs, trees and tries have been prop...
Journal
Journal title: ACM Transactions on Knowledge Discovery From Data
Year: 2022
ISSN: 1556-472X, 1556-4681
DOI: https://doi.org/10.1145/3488380